Feature importance score


Fair Feature Importance Scores via Feature Occlusion and Permutation

Little, Camille, Navarro, Madeline, Segarra, Santiago, Allen, Genevera

arXiv.org Machine Learning

As machine learning models increasingly impact society, their opaque nature poses challenges to trust and accountability, particularly in fairness contexts. Understanding how individual features influence model outcomes is crucial for building interpretable and equitable models. While feature importance metrics for accuracy are well-established, methods for assessing feature contributions to fairness remain underexplored. We propose two model-agnostic approaches to measure fair feature importance. First, we propose to compare model fairness before and after permuting feature values. This simple intervention-based approach decouples a feature from the model's predictions to measure the feature's contribution to fairness. Second, we evaluate the fairness of models trained with and without a given feature. This occlusion-based score enjoys dramatic computational simplification via minipatch learning. Our empirical results reflect the simplicity and effectiveness of our proposed metrics for multiple predictive tasks. Both methods offer simple, scalable, and interpretable solutions to quantify the influence of features on fairness, providing new tools for responsible machine learning development.
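
As a rough, model-agnostic sketch of the permutation-based idea described above (not the authors' implementation), the snippet below scores each feature by how much shuffling its values changes a demographic-parity gap; the choice of fairness metric, the sign convention, and the helper names are assumptions made for this example, and `model` stands for any fitted binary classifier with a `predict` method.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def permutation_fair_importance(model, X, group, n_repeats=10, seed=0):
    """Fair feature importance as the change in the fairness gap after
    permuting each feature's column (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    baseline = demographic_parity_gap(model.predict(X), group)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        gaps = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            gaps.append(demographic_parity_gap(model.predict(X_perm), group))
        # Positive score: permuting the feature shrinks the gap, i.e. the
        # feature is contributing to unfairness under this metric.
        scores[j] = baseline - np.mean(gaps)
    return scores
```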



Sobolev Independence Criterion

Neural Information Processing Systems

We propose the Sobolev Independence Criterion (SIC), an interpretable dependency measure between a high dimensional random variable X and a response variable Y. SIC decomposes to the sum of feature importance scores and hence can be used for nonlinear feature selection. SIC can be seen as a gradient regularized Integral Probability Metric (IPM) between the joint distribution of the two random variables and the product of their marginals. We use sparsity inducing gradient penalties to promote input sparsity of the critic of the IPM. In the kernel version, we show that SIC can be cast as a convex optimization problem by introducing auxiliary variables that play an important role in feature selection, as they are normalized feature importance scores. We then present a neural version of SIC where the critic is parameterized as a homogeneous neural network, improving its representation power as well as its interpretability. We conduct experiments validating SIC for feature selection on synthetic and real-world data. We show that SIC enables reliable and interpretable discoveries when used in conjunction with the holdout randomization test and knockoffs to control the False Discovery Rate. Code is available at http://github.com/ibm/sic.
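
For intuition only, a gradient-regularized IPM with a sparsity-inducing penalty on per-feature gradients can be written in roughly the following form; this is an illustrative reading of the description above, with the critic f and the penalty weights lambda and rho introduced here, not the paper's exact objective.

```latex
% Illustrative gradient-penalized IPM (notation introduced here, not the
% paper's exact formulation): the per-feature terms under the square roots
% would play the role of (unnormalized) feature importance scores.
\sup_{f}\;
  \mathbb{E}_{p(x,y)}\!\left[f(x,y)\right]
  - \mathbb{E}_{p(x)p(y)}\!\left[f(x,y)\right]
  - \lambda \sum_{j=1}^{d}
      \sqrt{\mathbb{E}\!\left[\bigl(\partial_{x_j} f(x,y)\bigr)^{2}\right]}
  - \frac{\rho}{2}\,\mathbb{E}\!\left[\lVert \nabla_{x} f(x,y) \rVert^{2}\right]
```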


Towards Automatic Concept-based Explanations

Neural Information Processing Systems

Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. Most current explanation methods provide explanations through feature importance scores, which identify the features that are important for each individual input. However, systematically summarizing and interpreting such per-sample feature importance scores is itself challenging. In this work, we propose principles and desiderata for concept-based explanation, which goes beyond per-sample features to identify higher-level, human-understandable concepts that apply across the entire dataset. We develop a new algorithm, ACE, to automatically extract visual concepts. Our systematic experiments demonstrate that ACE discovers concepts that are human-meaningful, coherent, and important for the neural network's predictions.


Rigorous Feature Importance Scores based on Shapley Value and Banzhaf Index

Huang, Xuanxiang, Létoffé, Olivier, Marques-Silva, Joao

arXiv.org Artificial Intelligence

Feature attribution methods based on game theory are ubiquitous in the field of eXplainable Artificial Intelligence (XAI). Recent works proposed rigorous feature attribution using logic-based explanations, specifically targeting high-stakes uses of machine learning (ML) models. Typically, such works exploit weak abductive explanation (WAXp) as the characteristic function to assign importance to features. However, one possible downside is that the contribution of non-WAXp sets is neglected. In fact, non-WAXp sets can also convey important information, because of the relationship between formal explanations (XPs) and adversarial examples (AExs). Accordingly, this paper leverages the Shapley value and the Banzhaf index to devise two novel feature importance scores. We take into account non-WAXp sets when computing feature contributions, and the novel scores quantify how effective each feature is at excluding AExs. Furthermore, the paper identifies properties and studies the computational complexity of the proposed scores.
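
As background for the game-theoretic scores discussed above, the exact Shapley value under an arbitrary characteristic function can be computed by enumerating coalitions. The sketch below is generic: the characteristic function is a placeholder, not the WAXp-based one the paper relies on, which is only named, not specified, in this abstract.

```python
from itertools import combinations
from math import factorial

def shapley_values(n_features, value):
    """Exact Shapley values for a game whose players are feature indices.

    `value` maps a frozenset of feature indices (a coalition) to a number.
    Exponential in n_features, so only practical for small feature sets.
    """
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy characteristic function, purely illustrative: all value is created by
# feature 0, so it receives the entire Shapley payoff.
print(shapley_values(3, lambda s: 1.0 if 0 in s else 0.0))
```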


Decorrelated feature importance from local sample weighting

Fröhlich, Benedikt, Durst, Alison, Behr, Merle

arXiv.org Machine Learning

Feature importance (FI) statistics provide a prominent and valuable method of insight into the decision process of machine learning (ML) models, but their effectiveness has well-known limitations when correlation is present among the features in the training data. In this case, the FI often tends to be distributed among all features that are correlated with the response-generating signal features. Even worse, if multiple signal features are strongly correlated with a noise feature, while being only modestly correlated with one another, the noise feature can end up with a distinctly larger FI score than any signal feature. Here we propose local sample weighting (losaw), which can flexibly be integrated into many ML algorithms to improve FI scores in the presence of feature correlation in the training data. Our approach is motivated by inverse probability weighting in causal inference and locally, within the ML model, uses a sample weighting scheme to decorrelate a target feature from the remaining features. This reduces model bias locally, whenever the effect of a potential signal feature is evaluated and compared to others. Moreover, losaw comes with a natural tuning parameter, the minimum effective sample size of the weighted population, which corresponds to an interpretation-prediction tradeoff, analogous to the bias-variance tradeoff governed by classical ML tuning parameters. We demonstrate how losaw can be integrated within decision tree-based ML methods and within mini-batch training of neural networks. We investigate losaw for random forests and convolutional neural networks in a simulation study covering diverse correlation patterns. We find that losaw consistently improves FI. Moreover, it often improves prediction accuracy on out-of-distribution data, while maintaining similar accuracy on in-distribution test data.
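
To make the inverse-probability-weighting intuition concrete, the sketch below computes weights that decorrelate one binary feature from the remaining features by modelling its propensity; the binary target feature, the logistic propensity model, and the clipping are simplifying assumptions for this illustration, not the losaw algorithm itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def decorrelation_weights(X, target_idx, clip=(0.05, 0.95)):
    """Inverse-probability weights under which the (binary) feature at
    `target_idx` looks approximately independent of the other features,
    so its effect on the response is less confounded by correlated features.
    """
    t = X[:, target_idx].astype(int)          # binary "treatment" feature
    Z = np.delete(X, target_idx, axis=1)      # remaining features
    propensity = LogisticRegression(max_iter=1000).fit(Z, t).predict_proba(Z)[:, 1]
    propensity = np.clip(propensity, *clip)   # bound extreme weights
    w = np.where(t == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))
    return w / w.mean()                       # normalise to mean one
```

Bounding the weights (here via `clip`) plays a role loosely analogous to the minimum-effective-sample-size tuning parameter mentioned above: tighter bounds decorrelate less but keep more effective data, the interpretation-prediction tradeoff the abstract describes.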


PLEX: Perturbation-free Local Explanations for LLM-Based Text Classification

Rahulamathavan, Yogachandran, Farooq, Misbah, De Silva, Varuna

arXiv.org Artificial Intelligence

Large Language Models (LLMs) excel in text classification, but their complexity hinders interpretability, making it difficult to understand the reasoning behind their predictions. Explainable AI (XAI) methods like LIME and SHAP offer local explanations by identifying influential words, but they rely on computationally expensive perturbations. These methods typically generate thousands of perturbed sentences and perform inferences on each, incurring a substantial computational burden, especially with LLMs. To address this, we propose Perturbation-free Local Explanation (PLEX), a novel method that leverages the contextual embeddings extracted from the LLM and a "Siamese network" style neural network trained to align with feature importance scores. This one-off training eliminates the need for subsequent perturbations, enabling efficient explanations for any new sentence. We demonstrate PLEX's effectiveness on four different classification tasks (sentiment, fake news, fake COVID-19 news and depression), showing more than 92% agreement with LIME and SHAP. Our evaluation using a "stress test" reveals that PLEX accurately identifies influential words, leading to a similar decline in classification accuracy as observed with LIME and SHAP when these words are removed. Notably, in some cases, PLEX demonstrates superior performance in capturing the impact of key features. PLEX dramatically accelerates explanation, reducing time and computational overhead by two and four orders of magnitude, respectively. This work offers a promising solution for explainable LLM-based text classification.

Large language models (LLMs) have significantly advanced text classification, achieving state-of-the-art results in tasks like emotion recognition, sentiment analysis, topic categorization, and spam detection [1]. Powered by transformer architectures with millions or billions of parameters, they effectively capture complex linguistic patterns. However, the very complexity that enables their high performance also renders their internal workings opaque and difficult to interpret.
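
The core training step can be pictured with a minimal sketch: per-token contextual embeddings from the classifier are regressed once onto teacher importance scores from a perturbation-based explainer, after which a single forward pass explains a new sentence. The embedding dimension, MSE objective, and plain MLP head below are placeholders chosen for this sketch; the paper describes a "Siamese network" style model, which this simplifies considerably.

```python
import torch
import torch.nn as nn

class TokenScorer(nn.Module):
    """Maps per-token contextual embeddings to scalar importance scores."""
    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, token_embeddings):               # (batch, seq_len, dim)
        return self.mlp(token_embeddings).squeeze(-1)  # (batch, seq_len)

# Stand-in tensors: embeddings from the classification LLM and teacher scores
# computed once with a perturbation-based explainer such as LIME or SHAP.
emb = torch.randn(32, 20, 768)
teacher_scores = torch.randn(32, 20)

model, loss_fn = TokenScorer(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                                   # one-off alignment training
    opt.zero_grad()
    loss = loss_fn(model(emb), teacher_scores)
    loss.backward()
    opt.step()
```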


TRIP: A Nonparametric Test to Diagnose Biased Feature Importance Scores

Foote, Aaron, Krizanc, Danny

arXiv.org Machine Learning

Along with accurate prediction, understanding the contribution of each feature to the making of the prediction, i.e., the importance of the feature, is a desirable and arguably necessary component of a machine learning model. For a complex model such as a random forest, such importances are not innate -- as they are, e.g., with linear regression. Efficient methods have been created to provide such capabilities, with one of the most popular among them being permutation feature importance due to its efficiency, model-agnostic nature, and perceived intuitiveness. However, permutation feature importance has been shown to be misleading in the presence of dependent features as a result of the creation of unrealistic observations when permuting the dependent features. In this work, we develop TRIP (Test for Reliable Interpretation via Permutation), a test requiring minimal assumptions that is able to detect unreliable permutation feature importance scores that are the result of model extrapolation. To build on this, we demonstrate how the test can be complemented in order to allow its use in high dimensional settings. Through testing on simulated data and applications, our results show that the test can be used to reliably detect when permutation feature importance scores are unreliable.
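
A minimal example of the failure mode the abstract targets (this demonstrates plain permutation importance under feature dependence, not TRIP): two nearly duplicate features force permutation importance to split credit, and permuting either one creates feature combinations the model never saw during training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = x1 + 0.1 * rng.normal(size=1000)       # nearly a copy of x1
noise = rng.normal(size=1000)               # pure noise feature
X = np.column_stack([x1, x2, noise])
y = x1 + rng.normal(scale=0.5, size=1000)   # response depends on x1 only

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
# Importance tends to be shared between x1 and x2, and the permuted rows
# (x1 and x2 disagreeing) lie outside the training distribution.
print(result.importances_mean)
```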


Enhancing interpretability of rule-based classifiers through feature graphs

Sirocchi, Christel, Verda, Damiano

arXiv.org Artificial Intelligence

In domains where transparency and trustworthiness are crucial, such as healthcare, rule-based systems are widely used and often preferred over black-box models for decision support systems due to their inherent interpretability. However, as rule-based models grow complex, discerning crucial features, understanding their interactions, and comparing feature contributions across different rule sets becomes challenging. To address this, we propose a comprehensive framework for estimating feature contributions in rule-based systems, introducing a graph-based feature visualisation strategy, a novel feature importance metric agnostic to rule-based predictors, and a distance metric for comparing rule sets based on feature contributions. By experimenting on two clinical datasets and four rule-based methods (decision trees, logic learning machines, association rules, and neural networks with rule extraction), we showcase our method's capability to uncover novel insights on the combined predictive value of clinical features, both at the dataset and class-specific levels. These insights can aid in identifying new risk factors, signature genes, and potential biomarkers, and determining the subset of patient information that should be prioritised to enhance diagnostic accuracy. Comparative analysis of the proposed feature importance score with state-of-the-art methods on 15 public benchmarks demonstrates competitive performance and superior robustness.
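
As a rough, predictor-agnostic illustration of a graph-based view of feature contributions in rule-based models (not the authors' framework), the sketch below extracts the root-to-leaf rules of a decision tree and connects features that co-occur within a rule; edge weights count how often two features act together. The dataset and tree depth are arbitrary choices for the example.

```python
from itertools import combinations
from collections import Counter

import networkx as nx
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X, y, feature_names = data.data, data.target, data.feature_names
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y).tree_

def rule_feature_sets(tree, node=0, path=()):
    """Yield the set of feature indices used along each root-to-leaf rule."""
    if tree.children_left[node] == -1:        # leaf: the path is one rule
        yield set(path)
        return
    f = tree.feature[node]
    yield from rule_feature_sets(tree, tree.children_left[node], path + (f,))
    yield from rule_feature_sets(tree, tree.children_right[node], path + (f,))

# Count how often pairs of features appear in the same rule.
edge_counts = Counter()
for feats in rule_feature_sets(tree):
    for a, b in combinations(sorted(feats), 2):
        edge_counts[(a, b)] += 1

# Build the feature graph: nodes are features, edge weights are co-occurrences.
G = nx.Graph()
for (a, b), w in edge_counts.items():
    G.add_edge(feature_names[a], feature_names[b], weight=w)
print(G.edges(data=True))
```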

